Approximating the influence of a monotone Boolean function in $O(\sqrt{n})$ query complexity



Dana Ron, Ronitt Rubinfeld, Muli Safra, Omri Weinstein

January 28, 2011 



Abstract 

The Total Influence (Average Sensitivity) of a discrete function is one of its fundamental measures.
We study the problem of approximating the total influence of a monotone Boolean function
$f : \{0,1\}^n \to \{0,1\}$, which we denote by $I[f]$. We present a randomized algorithm that
approximates the influence of such functions to within a multiplicative factor of $(1 \pm \epsilon)$ by performing
$O\left(\frac{\sqrt{n}\log n}{I[f]}\,\mathrm{poly}(1/\epsilon)\right)$ queries. We also prove a lower bound of
$\Omega\left(\frac{\sqrt{n}}{\log n \cdot I[f]}\right)$ on the query complexity of any constant-factor approximation
algorithm for this problem (which holds for $I[f] = \Omega(1)$), hence showing that our algorithm is almost
optimal in terms of its dependence on $n$. For general functions we give a lower bound of
$\Omega\left(\frac{n}{I[f]}\right)$, which matches the complexity of a simple sampling algorithm.

Keywords: influence of a Boolean function, sublinear approximation algorithms, random walks.

1 Introduction



The influence of a function, first introduced by Ben-Or and Linial [2] in the context of "collective
coin-flipping", captures the notion of the sensitivity of a multivariate function. More precisely, for a Boolean
function $f : \{0,1\}^n \to \{0,1\}$, the individual influence of coordinate $i$ on $f$ is defined as
$I_i[f] = \Pr_{x \in \{0,1\}^n}[f(x) \neq f(x^{(\oplus i)})]$, where $x$ is selected uniformly¹ in $\{0,1\}^n$ and
$x^{(\oplus i)}$ denotes $x$ with the $i$-th bit flipped. The total influence of a Boolean function $f$ (which we
simply refer to as the influence of $f$) is $I[f] = \sum_{i=1}^{n} I_i[f]$.

The study of the influence of a function and its individual influences (distribution) has been the focus 
of many papers (|Ilini|7l[l6l[3l[ll[35llll|6l[Il|28l[nito mention a few - for a survey see lUTl ). 
The influence of functions has played a central role in several areas of computer science. In particular,
this is true for distributed computing (e.g., ||2l|2Tl), hardness of approximation (e.g., |[T2l l22l ), learning
theory (e.g., ifTSl l9l l29l [30l [TOllfl) and property testing (e.g., |[l3lll|3|2i[3ll). The notion of influence also
arises naturally in the context of probability theory (e.g., 1321 [33l O), game theory (e.g., |[24l ), reliability
theory (e.g., 1231), as well as theoretical economics and political science (e.g., |[n [T9ll20l ).

Given that the influence is such a basic measure of functions and it plays an important role in many areas, 
we believe it is of interest to study the algorithmic question of approximating the influence of a function as 
efficiently as possible, that is by querying the function on as few inputs as possible. Specifically, the need 
for an efficient approximation for a function's influence might arise in the design of sublinear algorithms, 
and in particular property testing algorithms. 

As we show, one cannot improve on a standard sampling argument for the problem of estimating the
influence of a general Boolean function, which requires $\Omega(n/I[f])$ queries to the function, for any constant
multiplicative estimation factor.³ This fact justifies the study of subclasses of Boolean functions, among
which the family of monotone functions is a very natural and central one. Indeed, we show that the special 
structure of monotone functions implies a useful behavior of their influence, and thus the computational 
problem of approximating the influence of such functions becomes significantly easier. 



1.1 Our results and techniques 

We present a randomized algorithm that approximates the influence of a monotone Boolean function to
within any multiplicative factor of $(1 \pm \epsilon)$ in $O\left(\frac{\sqrt{n}\log n}{I[f]}\,\mathrm{poly}(1/\epsilon)\right)$
expected query complexity. We also prove an almost matching lower bound of
$\Omega\left(\frac{\sqrt{n}}{\log n \cdot I[f]}\right)$ on the query complexity of any constant-factor
approximation algorithm for this problem (which holds for $I[f] = \Omega(1)$).

As noted above, the influence of a function can be approximated by sampling random edges (i.e., pairs
$(x, x^{(\oplus i)})$ that differ on a single coordinate) from the $\{0,1\}^n$ lattice. A random edge has probability
$I[f]/n$ of being influential (i.e., of satisfying $f(x) \neq f(x^{(\oplus i)})$), so a standard sampling argument
implies that it suffices to ask $O\left(\frac{n}{I[f]}\,\mathrm{poly}(1/\epsilon)\right)$ queries in order to approximate
this probability to within $(1 \pm \epsilon)$.⁴
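
The direct edge-sampling estimator described above can be sketched as follows. This is a minimal illustration rather than code from the paper: the function names, the representation of inputs as lists of bits, and the fixed sample count are all assumptions made here.

```python
import random

def estimate_influence_by_sampling(f, n, num_samples, rng=None):
    """Estimate I[f] by sampling uniformly random edges of the Boolean lattice."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(num_samples):
        x = [rng.randint(0, 1) for _ in range(n)]  # uniform point
        i = rng.randrange(n)                       # uniform coordinate
        y = list(x)
        y[i] ^= 1                                  # x with the i-th bit flipped
        if f(x) != f(y):                           # the edge is influential
            hits += 1
    # A random edge is influential with probability I[f]/n, so rescale by n.
    return n * hits / num_samples

def dictator(x):
    """f(x) = x_1, whose influence is exactly 1."""
    return x[0]

def parity(x):
    """Parity of all bits: every edge is influential, so the influence is n."""
    return sum(x) % 2
```

Each sample costs two queries, and since the hit probability is $I[f]/n$, on the order of $n/I[f]$ samples are needed before influential edges are seen at all, which is the cost the random-walk approach is designed to reduce.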



¹ The influence can be defined with respect to other probability spaces (as well as for non-Boolean functions), but we focus on
the above definition.

² Here we referenced several works in which the influence appears explicitly. The influence of variables plays an implicit role in
many learning algorithms, and in particular those that build on Fourier analysis, beginning with [25].

³ If one wants an additive error of $\epsilon$, then $\Omega((n/\epsilon)^2)$ queries are necessary (when the influence is large) [27].

⁴ We also note that in the case of monotone functions, the total influence equals twice the sum of the Fourier coefficients

In order to achieve better query complexity, we would like to increase the probability of hitting an
influential edge in a single trial. The algorithm we present captures this intuition by taking random walks
down the $\{0,1\}^n$ lattice⁵ and then averaging the total number of influential edges encountered in all walks
over the number of walks taken. The crucial observation on which the algorithm relies is that a monotone
function can have at most one influential edge in a single path, and thus it is sufficient to query only the start
and end points of the walk to determine whether any influential edge was traversed.
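
The observation about paths can be checked empirically with a small sketch. The helper names below are hypothetical and the walk has no cut-off; the code only illustrates that, for a monotone function, a downward path contains an influential edge exactly when its two endpoints receive different values.

```python
import random

def walk_down(x, w, rng):
    """Take up to w steps down the lattice; each step turns a uniformly chosen 1-bit into 0."""
    x = list(x)
    path = [tuple(x)]
    for _ in range(w):
        ones = [i for i, b in enumerate(x) if b == 1]
        if not ones:          # reached the all-zeros point
            break
        x[rng.choice(ones)] = 0
        path.append(tuple(x))
    return path

def influential_edges_on_path(f, path):
    """Count edges along the path whose endpoints get different values under f."""
    return sum(1 for a, b in zip(path, path[1:]) if f(a) != f(b))

def majority(x):
    """Monotone example: 1 iff more than half of the bits are 1."""
    return 1 if 2 * sum(x) > len(x) else 0
```

Querying every point on the path, as `influential_edges_on_path` does, is only for verification; the algorithm itself queries `path[0]` and `path[-1]` alone.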

Before continuing the technical discussion concerning the algorithm and its analysis, we make the fol- 
lowing more conceptual note. Random walks have numerous applications in Computer Science as they are 
an important tool for mixing and sampling almost uniformly. In our context, where the walk is performed 
on the domain of an unknown function, it is used for a different purpose. Namely, by querying only the two 
endpoints of a random walk (starting from a uniformly sampled element) we (roughly) simulate the process 
of taking a much larger sample of elements. 

The main issue that remains is determining the length of the walk, which we denote by $w$. Let $P_w(f)$
denote the probability that a walk of length $w$ (down the lattice and from a uniformly selected starting point)
passes through some influential edge.⁶ We are interested in analyzing how $P_w(f)$ increases as a function of
$w$. We show that for $w$ that is $O(\epsilon\sqrt{n}/\log n)$, the value of $P_w(f)$ increases almost linearly with $w$. Namely,
it is $(1 \pm \epsilon) \cdot \frac{w}{n} \cdot I[f]$. Thus, by taking $w$ to be $\Theta(\epsilon\sqrt{n}/\log n)$ we get an improvement by a factor
of roughly $\sqrt{n}$ on the basic sampling algorithm. We note though that by taking $w$ to be larger we cannot ensure in
general the same behavior of $P_w(f)$ as a function of $w$ and $I[f]$, since the behavior might vary significantly
depending on $f$.

The way we prove the aforementioned dependence of $P_w(f)$ on $w$ is roughly as follows. For any edge
$e$ in the Boolean lattice, let $P_w(e)$ denote the probability that a walk of length $w$ (as defined above) passes
through $e$. By the observation made previously, that a monotone function can have at most one influential
edge in a given path, $P_w(f)$ is the sum of $P_w(e)$, taken over all edges $e$ that are influential with respect to $f$.
For our purposes it is important that $P_w(e)$ be roughly the same for almost all edges. Otherwise, different
functions that have the same number of influential edges, and hence the same influence $I[f]$, but whose
influential edges are distributed differently in the Boolean lattice, would give different values for $P_w(f)$.
We show that for $w = O(\epsilon\sqrt{n}/\log n)$, the value of $P_w(e)$ increases almost linearly with $w$ for all but a
negligible fraction of the influential edges (where 'negligible' is with respect to $I[f]$). This implies that
$P_w(f)$ grows roughly linearly in $w$ for $w = O(\epsilon\sqrt{n}/\log n)$.

⁴ (continued) that correspond to singleton sets $\{i\}$, $i \in \{1,\ldots,n\}$. Therefore, it is possible to approximate the
influence of a function by approximating this sum, which equals
$\frac{1}{2^n}\sum_{i=1}^{n}\left(\sum_{x \in \{0,1\}^n : x_i = 1} f(x) - \sum_{x \in \{0,1\}^n : x_i = 0} f(x)\right)$.
However, the direct sampling approach for such an approximation again requires $\Omega(n/I[f])$ samples.

⁵ That is, starting from a randomly selected point in $\{0,1\}^n$, at each step, if the current point is $x$, we uniformly select an index
$i$ such that $x_i = 1$ and continue the walk to $x^{(\oplus i)}$.

⁶ For technical reasons we actually consider a slightly different measure than $P_w(f)$, but we ignore this technicality in the
introduction.

To demonstrate the benefit of taking walks of length $O(\sqrt{n})$, let us consider the classic example of the
Majority function on $n$ variables. Here, all influential edges are concentrated in the exact middle levels of
the lattice (i.e., all of them are of the form $(x, x^{(\oplus i)})$ where the Hamming weight of $x$ is $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$).
The probability, $P_w(f)$, of a walk of length $w$ passing through some influential edge is simply the probability
of starting the walk at distance at most $w$ above the threshold $n/2$. Thus, taking longer walks allows us,
so to speak, to start our walk from a higher point in the lattice, and still hit an influential edge. Since the
probability of a uniformly chosen point to fall in each one of the first $\sqrt{n}$ levels above the middle is
roughly the same, the probability of hitting an influential edge in that case indeed grows roughly linearly
in the size of the walk. Nevertheless, taking walks of length which significantly exceeds $O(\sqrt{n})$ (say, even
$\Omega(\sqrt{n \cdot \log(n)})$) would add negligible contribution to that probability (as this contribution is equivalent to
the probability of a uniformly chosen point to deviate $\Omega(\sqrt{n \cdot \log(n)})$ levels from the middle level) and thus
the linear dependence on the length of the walk is no longer preserved.
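
The Majority example can be made concrete numerically. The sketch below assumes odd $n$, ignores the cut-off, and uses the standard closed form $I[\mathrm{Maj}_n] = n\binom{n-1}{(n-1)/2}/2^{n-1}$; the function names are illustrative and do not come from the paper. It compares the exact hitting probability of a length-$w$ walk with the linear prediction $\frac{w}{n} \cdot I[f]$.

```python
from math import comb

def majority_influence(n):
    """Exact total influence of Majority on an odd number n of variables."""
    return n * comb(n - 1, (n - 1) // 2) / 2 ** (n - 1)

def majority_walk_hit_probability(n, w):
    """Probability that a length-w downward walk from a uniform start crosses the
    threshold, i.e. that the start has Hamming weight in [(n+1)/2, (n-1)/2 + w]."""
    lo = (n + 1) // 2
    hi = min(n, (n - 1) // 2 + w)
    return sum(comb(n, h) for h in range(lo, hi + 1)) / 2 ** n

def ratio_to_linear_prediction(n, w):
    """Hitting probability divided by the linear prediction (w/n) * I[f]."""
    return majority_walk_hit_probability(n, w) / (w * majority_influence(n) / n)
```

For $n = 1001$ (so $\sqrt{n} \approx 32$), the ratio stays close to 1 for short walks such as $w = 5$ and collapses for $w = 300$, in line with the discussion above.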

2 Preliminaries 

In the introduction we defined the influence of a function as the sum, over all its variables, of their individual
influence. An equivalent definition is that the influence of a function $f$ is the expected number of sensitive
coordinates for a random input $x \in \{0,1\}^n$ (that is, those coordinates $i$ for which $f(x) \neq f(x^{(\oplus i)})$).

It will occasionally be convenient to view $f$ as a 2-coloring of the Boolean lattice. Under this setting,
any "bi-chromatic" edge, i.e., an edge $(x, x^{(\oplus i)})$ such that $f(x) \neq f(x^{(\oplus i)})$, will be called an influential
edge. The number of influential edges of a Boolean function $f$ is $2^{n-1} \cdot I[f]$.⁷
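
The relation between the influence and the number of influential edges is easy to confirm by brute force for small $n$. The following sketch is an illustration with hypothetical helper names and exponential running time, so it is only meant for tiny examples.

```python
from itertools import product

def total_influence(f, n):
    """Exact I[f]: the average number of sensitive coordinates over all 2^n inputs."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(y):
                total += 1
    return total / 2 ** n

def count_influential_edges(f, n):
    """Count bi-chromatic edges, visiting each edge once from its endpoint with x_i = 0."""
    count = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(x) != f(y):
                    count += 1
    return count

def majority(x):
    return 1 if 2 * sum(x) > len(x) else 0
```

For Majority on 5 variables the influence is 1.875 and there are 30 influential edges, which equals $2^{4} \cdot 1.875$.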

We consider the standard partial order '$\prec$' over the ($n$-dimensional) Boolean lattice. Namely, for
$x = (x_1,\ldots,x_n)$, $y = (y_1,\ldots,y_n) \in \{0,1\}^n$, we use the notation $x \prec y$ to mean that $x_i \le y_i$ for every $1 \le i \le n$,
and $x_i < y_i$ for some $1 \le i \le n$. A Boolean function $f : \{0,1\}^n \to \{0,1\}$ is said to be monotone
if $f(x) \le f(y)$ for all $x \prec y$. A well known isoperimetric inequality implies that any monotone Boolean
function satisfies $I[f] = O(\sqrt{n})$ (see [16] for a proof). This bound is tight for the notable Majority function.
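
As a small illustration of these definitions, the sketch below checks monotonicity by brute force (it suffices to compare points that differ in one coordinate) and evaluates the influence of Majority, which grows like $\sqrt{2n/\pi}$. The helper names are assumptions made for this example, and the closed form for Majority assumes odd $n$.

```python
from itertools import product
from math import comb, pi, sqrt

def is_monotone(f, n):
    """Check f(x) <= f(y) for every pair where y is x with one 0-bit raised to 1."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(x) > f(y):
                    return False
    return True

def majority(x):
    return 1 if 2 * sum(x) > len(x) else 0

def parity(x):
    return sum(x) % 2

def majority_influence(n):
    """Exact total influence of Majority on an odd number n of variables."""
    return n * comb(n - 1, (n - 1) // 2) / 2 ** (n - 1)
```

Majority passes the monotonicity check while parity does not, and `majority_influence(n) / sqrt(n)` approaches `sqrt(2 / pi)` as `n` grows, consistent with the $O(\sqrt{n})$ bound being tight.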

In this paper we deal mainly with monotone Boolean functions that have at least constant influence
(i.e., $I[f] \ge c$, for some $c > 0$), since the computational problem we study arises more naturally when the
function has some significant sensitivity. As shown in [21], the influence of a function is lower bounded
by $4 \cdot \Pr[f = 1] \cdot \Pr[f = 0]$, and so our analysis holds in particular for functions that are not too biased
(relatively balanced).

Notations. We use the notation $f(n) = \tilde{O}(g(n))$ if $f(n) = O(g(n)\,\mathrm{polylog}(g(n)))$. Similarly,
$f(n) = \tilde{\Omega}(g(n))$ if $f(n) = \Omega(g(n)/\mathrm{polylog}(g(n)))$.

3 The Algorithm 

As noted in the introduction, we can easily get a $(1 \pm \epsilon)$-factor estimate of the influence with high constant
probability by uniformly sampling $\Theta\left(\frac{n}{I[f]} \cdot \epsilon^{-2}\right)$ pairs $(x, x^{(\oplus i)})$ (edges in the Boolean lattice), querying
the function on these pairs, and considering the fraction of influential edges observed in the sample. We refer
to this as the direct sampling approach. However, since we are interested in an algorithm whose complexity
is $\frac{\sqrt{n}}{I[f]} \cdot \mathrm{poly}(1/\epsilon)$ we take a different approach. To be precise, the algorithm we describe works for $\epsilon$ that
is above a certain threshold (of the order of $\sqrt{\log n/n}$). However, if $\epsilon$ is smaller, then $\frac{n}{I[f]} \cdot \epsilon^{-2}$ is upper
bounded by $\frac{\sqrt{n}}{I[f]} \cdot \mathrm{poly}(1/\epsilon)$, and we can take the direct sampling approach. Thus we assume from this point
on that $\epsilon = \omega(\sqrt{\log n/n})$.

⁷ To verify this, observe that when partitioning the Boolean lattice into two sets with respect to a coordinate $i$, we end up with
$2^{n-1}$ vertices in each set. The individual influence of variable $i$, $I_i[f]$, is the fraction of the "bi-chromatic" edges among all edges
crossing the cut. Since $I[f] = \sum_{i=1}^{n} I_i[f]$, we get that the total number of influential edges is $2^{n-1} \cdot I[f]$.

As discussed in the introduction, instead of considering neighboring pairs, $(x, x^{(\oplus i)})$, we consider pairs
$(v, u)$ such that $v \succ u$ and there is a path down the lattice of length roughly $\epsilon\sqrt{n}$ between $v$ and $u$. Observe
that since the function $f$ is monotone, if the path (down the lattice) from $v$ to $u$ contains an influential edge,
then $f(v) \neq f(u)$, and furthermore, any such path can contain at most one influential edge. The intuition is
that since we "can't afford" to detect influential edges directly, we raise our probability of detecting edges
by considering longer paths.

In our analysis we show that this intuition can be formalized so as to establish the correctness of the
algorithm. We stress that when considering a path, the algorithm only queries its endpoints, so that it
"doesn't pay" for the length of the path. The precise details of the algorithm are given in Figure 1. When
we say that we take a walk of a certain length $w$ down the Boolean lattice with a cut-off at a certain level $\ell$,
we mean that we stop the walk (before taking all $w$ steps) if we reach a point in level $\ell$ (i.e., with Hamming
weight $\ell$).

Note that m, the number of walks taken, is a random variable. Namely, the algorithm continues taking 
new walks until the number of "successful" walks (that is, walks that pass through an influential edge) 
reaches a certain threshold, which is denoted by t. The reason for doing this, rather than deterministically 
setting the number of walks and considering the random variable which is the number of successful walks, is
that the latter approach requires knowing a lower bound on the influence of $f$. While it is possible to search
for such a lower bound (by working iteratively in phases and decreasing the lower bound on the influence 
between phases) our approach yields a somewhat simpler algorithm. 

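Algorithm 1: Approximating the Influence (given $\epsilon$, $\delta$ and oracle access to $f$)

1. Set $\tilde{\epsilon} = \epsilon/4$, $w = \frac{\tilde{\epsilon}\sqrt{n}}{16\sqrt{2\log(2n/\tilde{\epsilon})}}$, $s^* = \frac{1}{2}\sqrt{2n\log(2n/\tilde{\epsilon})}$, and $t = \frac{96\ln(2/\delta)}{\epsilon^2}$.

2. Initialize $\alpha \leftarrow 0$, $m \leftarrow 0$, and $\hat{I} \leftarrow 0$.

3. Repeat the following until $\alpha = t$:

(a) Perform a random walk of length $w$ down the $\{0,1\}^n$ lattice from a uniformly chosen point
$v$ with a cut-off at $n/2 - s^* - 1$, and let $u$ denote the endpoint of the walk.

(b) If $f(u) \neq f(v)$ then $\alpha \leftarrow \alpha + 1$.

(c) $m \leftarrow m + 1$

4. $\hat{I} \leftarrow \frac{n}{w} \cdot \frac{t}{m}$

5. Return $\hat{I}$.

Figure 1: The algorithm for approximating the influence of a function $f$.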
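
The steps of Figure 1 can be sketched in code as follows. This is an illustrative reading of the figure rather than a reference implementation, and it makes several assumptions that the text does not fix: natural logarithms are used throughout, $w$ is rounded down but never below 1, $t$ is rounded up, and a walk stops as soon as the current Hamming weight is at most $n/2 - s^* - 1$. For moderate $n$ the prescribed $w$ is smaller than 1, in which case the sketch degenerates to single-step walks.

```python
import math
import random

def approximate_influence(f, n, eps, delta, rng=None):
    """Sketch of Algorithm 1 for a monotone f that maps a list of n bits to 0 or 1."""
    rng = rng or random.Random()
    eps_t = eps / 4.0
    root = math.sqrt(2.0 * math.log(2.0 * n / eps_t))
    w = max(1, int(eps_t * math.sqrt(n) / (16.0 * root)))   # walk length (assumed rounding)
    s_star = 0.5 * math.sqrt(n) * root
    t = math.ceil(96.0 * math.log(2.0 / delta) / eps ** 2)  # required number of successes
    cutoff = n / 2.0 - s_star - 1.0                         # cut-off level
    alpha = 0  # successful walks so far
    m = 0      # walks taken so far
    while alpha < t:
        v = [rng.randint(0, 1) for _ in range(n)]
        u = list(v)
        ones = [i for i in range(n) if u[i] == 1]
        steps = 0
        while steps < w and ones and len(ones) > cutoff:
            j = rng.randrange(len(ones))        # uniformly chosen 1-coordinate
            ones[j], ones[-1] = ones[-1], ones[j]
            u[ones.pop()] = 0                   # step down the lattice
            steps += 1
        if f(u) != f(v):                        # only the two endpoints are queried
            alpha += 1
        m += 1
    return (n / w) * (t / m)
```

For example, `approximate_influence(lambda x: x[0], 30, 0.5, 0.1)` should return a value close to 1, the influence of the dictatorship function.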

In what follows we assume for simplicity that $I[f] \ge 1$. As we discuss subsequently, this assumption
can be easily replaced by $I[f] \ge c$ for any constant $c > 0$, or even $I[f] \ge n^{-c}$, by performing a slight
modification in the setting of the parameters of the algorithm.

Theorem 3.1 For every monotone function $f : \{0,1\}^n \to \{0,1\}$ such that $I[f] \ge 1$, and for every $\delta > 0$
and $\epsilon = \omega(\sqrt{\log n/n})$, with probability at least $1 - \delta$, the output, $\hat{I}$, of Algorithm 1 satisfies:

\[ (1-\epsilon) \cdot I[f] \le \hat{I} \le (1+\epsilon) \cdot I[f]. \]

Furthermore, with probability at least $1 - \delta$, the number of queries performed by the algorithm is

\[ O\left(\frac{\log(1/\delta)}{\epsilon^3} \cdot \frac{\sqrt{n}\log(n/\epsilon)}{I[f]}\right). \]

We note that the (probabilistic) bound on the number of queries performed by the algorithm implies that
the expected query complexity of the algorithm is $O\left(\frac{\log(1/\delta)}{\epsilon^3} \cdot \frac{\sqrt{n}\log(n/\epsilon)}{I[f]}\right)$. Furthermore, the probability
that the algorithm performs a number of queries that is more than $k$ times the expected value decreases
exponentially with $k$.

The next definition is central to our analysis. 

Definition 1 For a (monotone) Boolean function $f$ and integers $w$ and $s^*$, let $P_{w,s^*}(f)$ denote the probability
that a random walk of length $w$ down the Boolean lattice, from a uniformly selected point $v$ and with a cut-off
at $n/2 - s^* - 1$, starts from $f(v) = 1$ and reaches $f(u) = 0$, where $u$ is the endpoint of the walk.

Given the definition of $P_{w,s^*}(f)$, we next state and prove the main lemma on which the proof of
Theorem 3.1 is based.



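Lemma 3.2 Let $f$ satisfy $I[f] \ge 1$, let $\epsilon > 0$ satisfy $\epsilon > \frac{8\sqrt{2\log(8n/\epsilon)}}{\sqrt{n}}$, and denote $\tilde{\epsilon} = \epsilon/4$. For any
$w \le \frac{\tilde{\epsilon}\sqrt{n}}{16\sqrt{2\log(2n/\tilde{\epsilon})}}$ and for $s^* = \frac{1}{2}\sqrt{n} \cdot \sqrt{2\log(2n/\tilde{\epsilon})}$ we have that

\[ (1 - \epsilon/2) \cdot \frac{w}{n} \cdot I[f] \le P_{w,s^*}(f) \le (1 + \epsilon/2) \cdot \frac{w}{n} \cdot I[f]. \]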

Proof: For a point $y \in \{0,1\}^n$, let $h(y)$ denote its Hamming weight (which we also refer to as the level
in the Boolean lattice that it belongs to). By the choice of $s^* = \frac{1}{2}\sqrt{n} \cdot \sqrt{2\log(2n/\tilde{\epsilon})}$, and since $I[f] \ge 1$,
the number of points $y$ for which $h(y) \ge n/2 + s^*$ or $h(y) \le n/2 - s^*$ is upper bounded by $2^n \cdot \frac{\tilde{\epsilon} I[f]}{n}$.
Each such point $y$ is incident to $n$ edges, and each edge has two endpoints. It follows that there are at most
$2^{n-1} \cdot \tilde{\epsilon} I[f]$ edges $(y, x)$ for which $h(y) \ge n/2 + s^*$ or $h(y) \le n/2 - s^*$. Recall that an influential edge
$(y, x)$ for $h(y) = h(x) + 1$ is an edge that satisfies $f(y) = 1$ and $f(x) = 0$. Let $e_{s^*}(f)$ denote the number
of influential edges $(y, x)$ such that $n/2 - s^* \le h(x), h(y) \le n/2 + s^*$. Since the total number of influential
edges is $2^{n-1} I[f]$, we have that

\[ (1 - \tilde{\epsilon}) \cdot 2^{n-1} I[f] \le e_{s^*}(f) \le 2^{n-1} I[f]. \tag{1} \]

Consider any influential edge $(y, x)$ where $h(y) = \ell$ and $\ell \ge n/2 - s^*$. We are interested in obtaining
bounds on the probability that a random walk of length $w$ (where $w \le \frac{\tilde{\epsilon}\sqrt{n}}{16\sqrt{2\log(2n/\tilde{\epsilon})}}$) down the lattice,
starting from a uniformly selected point $v \in \{0,1\}^n$, and with a cut-off at $n/2 - s^* - 1$, passes through
$(y, x)$. First, there is the event that $v = y$ and the edge $(y, x)$ was selected in the first step of the walk. This
event occurs with probability $2^{-n} \cdot \frac{1}{\ell}$. Next there is the event that $v$ is at distance 1 from $y$ (and above it,
that is, $h(v) = h(y) + 1 = \ell + 1$), and the edges $(v, y)$ and $(y, x)$ are selected. This occurs with probability
$2^{-n} \cdot (n - \ell) \cdot \frac{1}{\ell+1} \cdot \frac{1}{\ell}$. In general, for every $1 \le i \le w - 1$ we have $(n - \ell) \cdots (n - \ell - i + 1)$ pairs
$(v, P)$ where $v \succ y$ and $h(v) = \ell + i$, and where $P$ is a path down the lattice from $v$ to $y$. The probability
of selecting $v$ as the starting vertex is $2^{-n}$ and the probability of taking the path $P$ from $v$ is $\frac{1}{(\ell+i) \cdots (\ell+1)}$.
Therefore, the probability that the random walk passes through $(y, x)$ is:

\[ 2^{-n} \cdot \frac{1}{\ell} \cdot \left(1 + \sum_{i=1}^{w-1} \frac{(n-\ell) \cdots (n-\ell-i+1)}{(\ell+i) \cdots (\ell+1)}\right) = 2^{-n} \cdot \frac{1}{\ell} \cdot \left(1 + \sum_{i=1}^{w-1} \prod_{j=0}^{i-1} \frac{n-\ell-j}{\ell+i-j}\right). \tag{2} \]



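Let $\ell = n/2 + s$ (where $s$ may be negative), and denote $\tau(\ell, i, j) \stackrel{\rm def}{=} \frac{n-\ell-j}{\ell+i-j}$. Then

\[ \tau(\ell, i, j) = \frac{n/2 - s - j}{n/2 + s + i - j} = 1 - \frac{2s + i}{n/2 + s + i - j}. \tag{3} \]

Consider first the case that $\ell \ge n/2$, i.e., $\ell = n/2 + s$ ($s \ge 0$). In that case it is clear that $\tau(\ell, i, j) \le 1$ (since
$j < i$), so $\prod_{j=0}^{i-1} \tau(\ell, i, j)$ is upper bounded by 1. In order to lower bound $\prod_{j=0}^{i-1} \tau(\ell, i, j)$, we note that

\[ \tau(\ell, i, j) \ge 1 - \frac{2s + i}{n/2} = 1 - \frac{2(2s + i)}{n}. \tag{4} \]

Thus, for $s \le s^*$ we have

\begin{align*}
\prod_{j=0}^{i-1} \tau(\ell, i, j) &\ge \prod_{j=0}^{i-1} \left(1 - \frac{2(2s+i)}{n}\right) \\
&\ge \left(1 - \frac{2(2s+w)}{n}\right)^{w} && \text{(since } i \le w\text{)} \\
&\ge 1 - \frac{2(2s+w)w}{n} \\
&\ge 1 - \frac{6 s^* w}{n} && \text{(} 2s + w \le 3s^* \text{ since } s \le s^* \text{ and } w \le s^* \text{)} \\
&\ge 1 - \frac{3\tilde{\epsilon}}{16} && \text{(by the definitions of } s^* \text{ and } w\text{)} \\
&\ge 1 - \tilde{\epsilon}/2. \tag{5}
\end{align*}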



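Therefore, we have that for $n/2 \le \ell \le n/2 + s^*$,

\[ 1 - \tilde{\epsilon}/2 \le \prod_{j=0}^{i-1} \frac{n-\ell-j}{\ell+i-j} \le 1, \tag{6} \]

and for $\ell > n/2 + s^*$ it holds that

\[ \prod_{j=0}^{i-1} \frac{n-\ell-j}{\ell+i-j} \le 1. \tag{7} \]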



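We turn to the case where $n/2 - s^* \le \ell < n/2$. Writing $\ell = n/2 - s$ with $0 < s \le s^*$, here we have

\[ \tau(\ell, i, j) = 1 + \frac{2s - i}{n/2 - s + i - j} \ge 1 - \frac{2w}{n - 2w} \ge 1 - \frac{4w}{n}, \tag{8} \]

where the last inequality follows from the fact that $w \le n/4$. Thus,

\[ \prod_{j=0}^{i-1} \tau(\ell, i, j) \ge \left(1 - \frac{4w}{n}\right)^{w} \ge 1 - \frac{4w^2}{n} \ge 1 - \frac{\tilde{\epsilon}^2}{128\log(2n/\tilde{\epsilon})} \ge 1 - \tilde{\epsilon}^2/2 \ge 1 - \tilde{\epsilon}/2. \tag{9} \]

On the other hand,

\[ \tau(\ell, i, j) = 1 + \frac{2s - i}{n/2 - s + i - j} \le 1 + \frac{2s}{n/2 - s} \le 1 + \frac{8s}{n}, \tag{10} \]

where the last inequality holds since $n \ge 4s$. Thus, we have

\[ \prod_{j=0}^{i-1} \tau(\ell, i, j) \le \left(1 + \frac{8s^*}{n}\right)^{w} \le 1 + \frac{16 s^* w}{n} \le 1 + \tilde{\epsilon}/2, \tag{11} \]

where the second inequality follows from the inequality $(1 + a)^k \le 1 + 2ak$ which holds for $a \le 1/(2k)$;
indeed, in our case $8s^*/n \le 1/(2w)$ (this is equivalent to $w \le n/(16s^*)$, which holds given our setting of $s^*$
and the upper bound on $w$).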
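
For completeness, the inequality $(1+a)^k \le 1 + 2ak$ used above can be verified as follows. This short derivation is added here for the reader and is not part of the original text; it assumes $a \ge 0$ and $ak \le 1/2$.

```latex
\begin{align*}
(1+a)^k &\le e^{ak} && \text{(since } 1 + a \le e^{a}\text{)} \\
        &\le 1 + 2ak && \text{(since } e^{z} \le 1 + 2z \text{ for } 0 \le z \le 1\text{, applied with } z = ak \le 1/2\text{)}.
\end{align*}
% The second step holds because g(z) = e^z - 1 - 2z is convex with g(0) = 0 and g(1) = e - 3 < 0,
% so g(z) <= 0 on the whole interval [0, 1].
% With a = 8 s^* / n and k = w, the bound reads 1 + 16 s^* w / n.
```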

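We therefore have that for $n/2 - s^* \le \ell < n/2$,

\[ 1 - \tilde{\epsilon}/2 \le \prod_{j=0}^{i-1} \frac{n-\ell-j}{\ell+i-j} \le 1 + \tilde{\epsilon}/2. \tag{12} \]

Combining Equations (6) and (12), we have that for $n/2 - s^* \le \ell \le n/2 + s^*$,

\[ 1 - \tilde{\epsilon}/2 \le \prod_{j=0}^{i-1} \frac{n-\ell-j}{\ell+i-j} \le 1 + \tilde{\epsilon}/2. \tag{13} \]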
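
The two-sided bound on the product can be sanity-checked numerically for concrete parameters. The sketch below is an illustration only: it assumes natural logarithms, rounds $w$ down to an integer, and uses hypothetical function names; it computes the parameters of the lemma and the product $\prod_{j=0}^{i-1} \frac{n-\ell-j}{\ell+i-j}$ directly.

```python
import math

def lemma_parameters(n, eps):
    """Return (eps_tilde, w, s_star) as set in the lemma, with w rounded down."""
    eps_t = eps / 4
    root = math.sqrt(2 * math.log(2 * n / eps_t))
    w = int(eps_t * math.sqrt(n) / (16 * root))
    s_star = 0.5 * math.sqrt(n) * root
    return eps_t, w, s_star

def path_ratio_product(n, level, i):
    """Product over j = 0..i-1 of (n - level - j) / (level + i - j)."""
    prod = 1.0
    for j in range(i):
        prod *= (n - level - j) / (level + i - j)
    return prod
```

For $n = 4{,}000{,}000$ and $\epsilon = 1$ this gives $w = 5$ and $s^* \approx 5879$, and the product stays within $[1 - \tilde{\epsilon}/2,\, 1 + \tilde{\epsilon}/2]$ for all levels within $s^*$ of the middle, as the analysis predicts.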

Now, we are interested in summing up the probability, over all random walks, that the walk passes through an
influential edge. Since the function is monotone, every random walk passes through at most one influential
edge, so the sets of random walks that correspond to different influential edges are disjoint (that is, the event
that a walk passes through an influential edge $(y, x)$ is disjoint from the event that it passes through another
influential edge $(y', x')$). Since the edges that contribute to $P_{w,s^*}(f)$ are all from levels $\ell \ge n/2 - s^*$ (and
since there are $2^{n-1} I[f]$ influential edges in total), by Equations (2) and (13) we have

\begin{align}
P_{w,s^*}(f) &\le 2^{n-1} I[f] \cdot 2^{-n} \cdot \frac{1}{n/2 - s^*} \left(1 + \sum_{i=1}^{w-1} (1 + \tilde{\epsilon}/2)\right) \tag{14} \\
&\le \frac{1}{2} I[f] \cdot \frac{1}{n/2 - s^*} \cdot w (1 + \tilde{\epsilon}/2) \tag{15} \\
&\le \frac{1}{2} I[f] \cdot \frac{2w}{n} (1 + \tilde{\epsilon})(1 + \tilde{\epsilon}/2) \tag{16} \\
&\le \frac{w \cdot I[f]}{n} (1 + 2\tilde{\epsilon}) \tag{17} \\
&= \frac{w \cdot I[f]}{n} (1 + \epsilon/2), \tag{18}
\end{align}

where Equation (16) follows from the definition of $s^*$, the premise of the lemma that $\epsilon > \frac{8\sqrt{2\log(8n/\epsilon)}}{\sqrt{n}}$, and
$\tilde{\epsilon} = \epsilon/4$.
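
For lower bounding $P_{w,s^*}(f)$, we will consider only the contribution of the influential edges that belong to
levels $\ell \le n/2 + s^*$. Consequently, Equations (2) and (13) give in total

\begin{align}
P_{w,s^*}(f) &\ge 2^{n-1} (1 - \tilde{\epsilon}) I[f] \cdot 2^{-n} \cdot \frac{1}{n/2 + s^*} \left(1 + \sum_{i=1}^{w-1} (1 - \tilde{\epsilon}/2)\right) \tag{19} \\
&\ge \frac{1}{2} I[f] (1 - \tilde{\epsilon}) \cdot w (1 - \tilde{\epsilon}/2) \cdot \frac{1}{n/2 + s^*} \tag{20} \\
&\ge \frac{1}{2} I[f] \cdot w (1 - \tilde{\epsilon})(1 - \tilde{\epsilon}/2) \cdot \frac{2}{n} (1 - \tilde{\epsilon}/2) \tag{21} \\
&\ge \frac{w \cdot I[f]}{n} (1 - 2\tilde{\epsilon}) \tag{22} \\
&= \frac{w \cdot I[f]}{n} (1 - \epsilon/2), \tag{23}
\end{align}

where Equation (21) follows from the definition of $s^*$, the premise of the lemma that $\epsilon > \frac{8\sqrt{2\log(8n/\epsilon)}}{\sqrt{n}}$, and
$\tilde{\epsilon} = \epsilon/4$.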

Equations (18) and (23) give

\[ (1 - \epsilon/2) \cdot \frac{w}{n} \cdot I[f] \le P_{w,s^*}(f) \le (1 + \epsilon/2) \cdot \frac{w}{n} \cdot I[f], \tag{24} \]

as claimed in the Lemma. ■

Proof of Theorem 3.1: For $w$ and $s^*$ as set by the algorithm, let $P_{w,s^*}(f)$ be as in Definition 1, where we
shall use the shorthand $p(f)$. Recall that $m$ is a random variable denoting the number of iterations performed
by the algorithm until it stops (once $\alpha = t$). Let $\tilde{m} = \frac{t}{p(f)}$, $\tilde{m}_1 = \frac{\tilde{m}}{1+\epsilon/4}$, and $\tilde{m}_2 = (1 + \epsilon/4) \cdot \tilde{m}$. We say
that an iteration of the algorithm is successful if the walk taken in that iteration passes through an influential
edge (so that the value of $\alpha$ is increased by 1). Let $\hat{p}(f) = \frac{t}{m}$ denote the fraction of successful iterations.
Suppose that $\tilde{m}_1 \le m \le \tilde{m}_2$. In such a case,

\[ (1 - \epsilon/4) \cdot p(f) \le \hat{p}(f) \le (1 + \epsilon/4) \cdot p(f) \tag{25} \]

since $\hat{p}(f) = \frac{t}{m} = \frac{p(f) \cdot \tilde{m}}{m}$. By the definition of the algorithm, $\hat{I} = \frac{n}{w} \cdot \frac{t}{m} = \frac{n}{w} \cdot \hat{p}(f)$, so by Lemma 3.2
(recall that by the premise of the theorem, $\epsilon = \omega(\sqrt{\log n/n})$) we have

\[ (1-\epsilon) I[f] \le (1-\epsilon/2)(1-\epsilon/4) I[f] \le \hat{I} \le (1+\epsilon/4)(1+\epsilon/2) I[f] \le (1+\epsilon) I[f] \tag{26} \]

and thus (assuming $\tilde{m}_1 \le m \le \tilde{m}_2$), the output of the algorithm provides the estimation we are looking for.

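It remains to prove that $\tilde{m}_1 \le m \le \tilde{m}_2$ with probability at least $1 - \delta$. Let $X_i$ denote the indicator
random variable whose value is 1 if and only if the $i$-th iteration of the algorithm was successful, and let
$X = \sum_{i=1}^{\tilde{m}_1} X_i$. By the definition of $X_i$, we have that $E[X_i] = p(f)$, and so (by the definition of $\tilde{m}_1$ and $\tilde{m}$)
we have that $E[X] = \tilde{m}_1 \cdot p(f) = \frac{t}{1+\epsilon/4}$. Hence, by applying the multiplicative Chernoff bound,

\[ \Pr[m < \tilde{m}_1] \le \Pr[X \ge t] = \Pr[X \ge (1 + \epsilon/4) E[X]] \le \exp\left(-\frac{1}{3}\left(\frac{\epsilon}{4}\right)^2 \frac{t}{1+\epsilon/4}\right) \le \exp\left(-\frac{\epsilon^2 t}{96}\right). \tag{27} \]

Thus, for $t = \frac{96\ln(2/\delta)}{\epsilon^2}$ we have that $\Pr[m < \tilde{m}_1] \le \frac{\delta}{2}$. By an analogous argument we get that
$\Pr[m > \tilde{m}_2] \le \frac{\delta}{2}$, and so $\tilde{m}_1 \le m \le \tilde{m}_2$ with probability at least $1 - \delta$, as desired.

Since we have shown that $m \le \tilde{m}_2$ with probability at least $1 - \delta$, and the query complexity of the
algorithm is $O(m)$, we have that, with probability at least $1 - \delta$, the query complexity is upper bounded by

\[ O(\tilde{m}_2) = O\left(\frac{t \cdot n}{w \cdot I[f]}\right) = O\left(\frac{\log(1/\delta)}{\epsilon^3} \cdot \frac{\sqrt{n}\log(n/\epsilon)}{I[f]}\right) \tag{28} \]

as required. ■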



Remark. We assumed that $I[f] \ge 1$ only for the sake of technical simplicity. This assumption can be
replaced with $I[f] \ge n^{-c}$ for any constant $c > 0$, and the only modifications needed in the algorithm and

its analysis are the following. The level of the cutoff s* should be set to s* = \Jn/2 ■ \J^'^z{^^^^ = 

\^/nyj2c\og{2n) + log(l/e) (which is a constant factor larger than the current setting), and the length w 

of the walks in the algorithm should be set to w = , (which is a constant factor smaller than 

16^21og(^) 

the current setting).

The first modification follows from the fact that the number of points $y$ whose Hamming weight $h(y)$ is
at least $n/2 + r \cdot \sqrt{n/2}$ or at most $n/2 - r \cdot \sqrt{n/2}$ is upper bounded by $2^n \cdot 2e^{-r^2}$. This implies that the
number of edges $(y, x)$ (where $h(y) = h(x) + 1$) such that $h(y) \ge n/2 + r \cdot \sqrt{n/2}$ or $h(y) \le n/2 - r \cdot \sqrt{n/2}$
is upper bounded by $n \cdot 2^n \cdot 2e^{-r^2}$. Requiring that the latter is no more than $\tilde{\epsilon} \cdot I[f] 2^{n-1} \ge \tilde{\epsilon} \cdot n^{-c} 2^{n-1}$ (i.e.,
an $\tilde{\epsilon}$-fraction of the total number of influential edges), yields the desired $r$, where $s^* = r\sqrt{n/2}$. The second
modification, i.e., in the length of the walk, is governed by the choice of $s^*$, since, by the analysis, their
product should be bounded by $O(\tilde{\epsilon} n)$. Since in both expressions $1/I[f] = \mathrm{poly}(n)$ appears only inside a log
term, this translates only to a constant factor increase.

We note that the lower bound we give in Section 4 applies only to functions with (at least) constant
influence, and so in the above case where $I[f] = 1/\mathrm{poly}(n)$, the tightness of the algorithm (in terms of
query complexity) is not guaranteed.



4 A Lower Bound 

In this section we prove a lower bound of $\Omega\left(\frac{\sqrt{n}}{I[f] \cdot \log n}\right)$ on the query complexity of approximating the
influence of monotone functions. Following it we explain how a related construction gives a lower bound
of $\Omega\left(\frac{n}{I[f]}\right)$ on approximating the influence of general functions. The idea for the first lower bound is the
following. We show that any algorithm that performs $o\left(\frac{\sqrt{n}}{I[f] \cdot \log n}\right)$ queries cannot distinguish with constant
success probability between the following: (1) A certain threshold function (over a relatively small number
of variables), and (2) A function selected uniformly at random from a certain family of functions that have
significantly higher influence than the threshold function. The functions in this family can be viewed as
"hiding their influence behind the threshold function". More precise details follow.

We first introduce one more notation. For any integer $1 \le k \le n$ and $0 \le t \le k$, let $\tau_k^t : \{0,1\}^n \to \{0,1\}$
be the $t$-threshold function over $x_1, \ldots, x_k$. That is, $\tau_k^t(x) = 1$ if and only if $\sum_{i=1}^{k} x_i \ge t$. Observe
that (since for every $1 \le i \le k$ we have that $I_i[\tau_k^t] = 2^{-k} \cdot 2 \cdot \binom{k-1}{t-1}$ while for $i > k$ we have that $I_i[\tau_k^t] = 0$),
$I[\tau_k^t] = k \cdot 2^{-(k-1)} \cdot \binom{k-1}{t-1}$.
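
The closed form for the influence of the threshold function can be checked against a brute-force computation for small $k$. The sketch below is illustrative, with hypothetical function names; it restricts attention to the $k$ relevant variables, since the remaining $n - k$ coordinates have influence 0.

```python
from itertools import product
from math import comb

def threshold_influence(k, t):
    """Closed form k * 2^-(k-1) * C(k-1, t-1) for the t-threshold function on k variables."""
    return k * comb(k - 1, t - 1) / 2 ** (k - 1)

def brute_force_threshold_influence(k, t):
    """Average number of sensitive coordinates of x -> [sum(x) >= t] over {0,1}^k."""
    def tau(x):
        return 1 if sum(x) >= t else 0
    total = 0
    for x in product((0, 1), repeat=k):
        for i in range(k):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if tau(x) != tau(y):
                total += 1
    return total / 2 ** k
```

For instance, with $k = 6$ and $t = 2$ both computations give $30/32 = 0.9375$.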

The above observation implies that for every sufficiently large $k$ ($k \ge 2\log n$ suffices), there exists a
setting of $t < k/2$, which we denote by $t(k, 1)$, such that $I[\tau_k^{t(k,1)}] = 1 - o(1)$ (where the $o(1)$ is with
respect to $k$). This setting satisfies $\binom{k-1}{t(k,1)-1} = \Theta(2^k/k)$ (so that $t(k, 1) = k/2 - \Theta(\sqrt{k \log k})$).

Theorem 4.1 For every $I^*$ such that $2 \le I^* \le \sqrt{n}/\log n$, there exists a family of monotone functions $F_{I^*}$
such that $I[f] \ge I^*$ for every $f \in F_{I^*}$, but any algorithm that distinguishes with probability at least 2/3
between a uniformly selected function in $F_{I^*}$ and $\tau_k^{t(k,1)}$ for $k = 2\log n$, must perform $\Omega\left(\frac{\sqrt{n}}{I^* \cdot \log n}\right)$ queries.

In particular, considering $I^* = c$ for any constant $c \ge 2$, we get that every algorithm for approximating
the influence to within a multiplicative factor of $\sqrt{c}$ must perform $\tilde{\Omega}(\sqrt{n})$ queries. If we increase the
lower bound on the influence, then the lower bound on the complexity of the algorithm decreases, but the
approximation factor (for which the lower bound holds) increases. We note that the functions for which the
lower bound construction holds are not balanced, but we can easily make them very close to balanced without
any substantial change in the argument (by "ORing" $\tau_k^{t(k,1)}$ as well as every function in $F_{I^*}$ with $x_1$). We
also note that for $I^* = \Omega(\sqrt{\log n})$ we can slightly improve the lower bound on approximating the influence
to $\Omega\left(\frac{\sqrt{n}}{I^* \cdot \sqrt{\log(\sqrt{n}/I^*)}}\right)$ (for a slightly smaller approximation factor). We address this issue following the
proof.

Proof: For $k = 2\log n$ and for any $0 \le t \le k$, let $L_k^t \stackrel{\rm def}{=} \{x \in \{0,1\}^k : \sum_{i=1}^{k} x_i = t\}$. We shall also
use the shorthand $\tilde{t}$ for $t(k, 1)$. Fixing a choice of $I^*$, each function in $F_{I^*}$ is defined by a subset $R$ of $L_k^{\tilde{t}}$,
where $|R| = \beta(I^*) \cdot 2^k$ for $\beta(I^*)$ that is set subsequently. We denote the corresponding function by $f_R$ and
define it as follows: For every $x \in \{0,1\}^n$, if $x_1 \ldots x_k \notin R$, then $f_R(x) = \tau_k^{\tilde{t}}(x)$, and if $x_1 \ldots x_k \in R$, then
$f_R(x) = \mathrm{maj}'_{n-k}(x)$, where $\mathrm{maj}'_{n-k}(x) = 1$ if and only if $\sum_{i=k+1}^{n} x_i > (n - k)/2$. By this definition, for
every $f_R \in F_{I^*}$

\[ I[f_R] \ge \beta(I^*) \cdot I[\mathrm{maj}'_{n-k}]. \tag{29} \]

If we take $\beta(I^*)$ to be $\beta(I^*) = I^*/I[\mathrm{maj}'_{n-k}] = cI^*/\sqrt{n-k}$ (for $c$ that is roughly $\sqrt{\pi/2}$), then in $F_{I^*}$
every function has influence at least $I^*$. Since $\beta(I^*)$ is upper bounded by $|L_k^{\tilde{t}}|/2^k$, which is of the order of
$1/k = 1/(2\log n)$, this construction is applicable to $I^* = O(\sqrt{n}/\log n)$.

Consider an algorithm that needs to distinguish between $\tau_k^{\tilde{t}}$ and a uniformly selected $f_R \in F_{I^*}$. Clearly,
as long as the algorithm doesn't perform a query on $x$ such that $x_1 \ldots x_k \in R$, the value returned by $f_R$
is the same as that of $\tau_k^{\tilde{t}}$. But since $R$ is selected uniformly in $L_k^{\tilde{t}}$, as long as the algorithm performs less
than $\frac{|L_k^{\tilde{t}}|}{c' \cdot |R|}$ queries (where $c'$ is some sufficiently large constant), with high constant probability (over
the choice of $R$), it won't "hit" a point in $R$. Since $\frac{|L_k^{\tilde{t}}|}{c' \cdot |R|} = \Omega\left(\frac{\sqrt{n}}{\log n \cdot I^*}\right)$, the theorem follows. ■



In order to get the aforementioned slightly higher lower bound for $I^* = \Omega(\sqrt{\log n})$, we modify the
settings in the proof of Theorem 4.1 in the following manner. We set $k = \log(\sqrt{n}/I^*)$ and $t = k/2$
(so that the "low influence" function is simply a majority function over $k$ variables, $\tau_k^{k/2}$). For the "high
influence" function, we let $R$ consist of a single point $x$ in $L_k^{k/2}$, where for each $R = \{x\}$ we have a
different function in Fj* (as defined in the proof of Theorem 14. lb . It follows that for each such R, Ilfn] = 
(1 - o{l))Vk + ^\/^r^ > /*, while /[rf/^] ^ Vk = 0(Vbg7^). By the same argument as in the 

proof of Theorem 14. 1[ if the algorithm preforms less than — r^i — = -r-j^ = / " queries (for 






small enough $c'$), with high probability it won't "hit" $x$, and thus will not be able to distinguish between a
randomly selected function $f \in F_{I^*}$ (where the randomness is over the choice of $x \in L_k^{k/2}$) and $\tau_k^{k/2}$.

A lower bound of $\Omega(n/I[f])$ for general functions. We note that for general (not necessarily monotone)
functions, there is a lower bound of $\Omega(n/I[f])$ on estimating the influence, which implies that it is not
possible in general to improve on the simple edge-sampling approach (in terms of the dependence on $n$ and
$I[f]$). Similarly to what we showed in the case of monotone functions, we show that for every $I^* \ge 2$, it
is hard to distinguish between the dictatorship function $f(x) = x_1$ (for which $I[f] = 1$) and a uniformly
selected function in a family $F_{I^*}$ of functions, where every function in $F_{I^*}$ has influence at least $I^*$.

Similarly to the construction in the proof of Theorem 4.1, we consider the first $k$ variables, where here
$k = \log n$. Fixing $I^*$ (where $I^* = o(n)$ or else the lower bound is trivial), each function in $F_{I^*}$ is defined by a
subset $R$ of $\{0,1\}^k$ such that $|R| = I^*$. We denote the corresponding function by $f_R$ and define it as follows:
For every $x \in \{0,1\}^n$, if $x_1 \ldots x_k \notin R$ then $f_R(x) = x_1$, and if $x_1 \ldots x_k \in R$, we let $f_R(x) = \bigoplus_{i=k+1}^{n} x_i$.

By this definition (since $2^k = n$), for every $f_R \in F_{I^*}$, $I[f_R] \ge (1 - 2I^*/n) + (I^*/n) \cdot (n - k) \ge I^*$. The
argument for establishing that it is hard to distinguish between $f(x) = x_1$ and a uniformly selected function
in $F_{I^*}$ is essentially the same as in the proof of Theorem 4.1.